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Abstract 

The Activation-Relaxation Technique nouveau (ARTn) is an eigenvector following method for 
systematic search of saddle points and transition pathways on a given potential energy surface. 
We propose a variation of this method aiming at improving the efficiency of the local convergence 
close to the saddle point. We prove the convergence and robustness of this new algorithm. The 
efficiency of the method is tested in the case of point defects in body centered cubic iron. 



1 



I. INTRODUCTION 



The Activation-Relaxation Technique (ART^^^^l is a powerful method for searching 
saddle points and transition pathways of a given potential energy surface (PES). Search 
methods for saddle points and transition pathways can actually be classified in two main 
categories. In the first class of methods, one assumes that two local minima of the PES 
are known. The main objective of the methods in this class is to find the minimum energy 
path to go from one local minimum to the other one. The Replica Chain method^, the 
Nudged Elastic Band^^ 1 ^, the String method^ 1 ^, the Transition Path Sampling^^^^ 
and the Discrete Path Sampling^ are some methods belonging to this class (note that the 
Nudged Elastic Band method has been generalized to the finite temperature setting^, as 
well as the String method^). In the second class of methods, one assumes that only one 
local minimum of the PES is known. The aim of methods in this class is to find a saddle 
point of the PES, from which the exploration will be pursued toward a different local mini- 
mum, yielding a transition path. Probably the first method in that class is the Eigenvector 
Following method^. The Dimer method^, the Conformational Flooding method^, the 
Hyperdynamics methocP^ 1 ^ 7 -, the Parallel Replica method 28 , the Temperature Accelerated 
methocP^ 1 ^, the Scaled Hypersphere Search method^ 1 ^ 1 ^, are other examples. In this ar- 
ticle, we study the Activation-Relaxation Technique, which belongs to this second class. We 
focus here on the zero temperature case, the so-called ART nouveau (ARTn) methocr^ 1 ^, 
and do not consider the finite temperature case, the so-called POP-ART method^. 

The ART method is composed of two main steps, the activation step and the relaxation 
step. The activation step consists in moving the system from a local minimum to a saddle 
point. The relaxation step consists in relaxing the system, from the computed saddle point, 
to another local minimum. Of course, this relaxation step is very fast (and easy to perform) 
in comparison with the activation step. 

The activation step itself can be divided into two substeps. The first substep aims at 
finding some region of the PES with one direction of negative curvature, which hopefully 
contains a first order saddle point, and that we will call the "attracting region". The 
basic idea for finding a point on the PES with one direction of negative curvature is to 
choose a random vector r, and next to repeat the two following operations: (i) move the 
system according to r, (ii) relax the system in the hyperplane orthogonal to r, until a point 

2 



with one direction of negative curvature has been found (see section 3 for details). The 
second substep consists in finding a saddle point in the reached attracting region. From 
a numerical viewpoint, these two substeps are of very different nature. In this article, 
we focus on the second substep, namely the local convergence to a saddle point, starting 
from a configuration with one direction of negative curvature. In section [III we present 
a simple, prototypical, ART-like algorithm, which has better local convergence properties 
than existing ones. Loosely speaking, this algorithm is optimal in the principal direction of 
negative curvature, but suboptimal in the transverse directions. This is why this algorithm 
has to be considered as a prototype, on the basis of which more complex numerical strategies 
can be elaborated. Some numerical results are reported on in section IIII[ where we consider 
the problem of vacancy diffusion in crystalline materials. The numerical results obtained 
on this problem demonstrate the efficiency of our approach. We gather in the Appendix a 
convergence and robustness analysis of this new algorithm. 

II. A NEW TYPE OF ART-LIKE ALGORITHMS 

From a mathematical viewpoint, a PES for an isolated molecular system with N atoms 
is a function E : M? N /G r — > K, N being the number of atoms in the system and G r = 
R 3 x SO (3) the group of rigid body movements which act on R 3JV in the following way: for 
all g = (x , R) G G r = R 3 x SO(3), and for all X = (x\ ■ • • , x N ) G R 3JV , 

g ■ X = (R(x! - x ), ■ ■ ■ , R(x N - x )). 

This viewpoint takes into account the fact that the potential energy E(X) of the system is 
invariant upon rigid body movements. In the simulation of the condensed phase, artificial 
periodic boundary conditions are usually introduced. In this case, the system is translation 
invariant, but not rotation invariant, and a PES then has to be regarded as a function 
E : T 3JV /R 3 — > R, where T m is a 3 A dimensional torus. 

For our purpose, namely for the analysis of ART-like methods, there is no restriction 
in assuming that the PES under consideration is a function / : R d — > M. with isolated 
critical points. For x G R , we denote by V/(x) the gradient of / at the point x and by 
H(x) = V 2 f(x) the hessian of / at the point x. For x G M. d , let Ai(x) < \2(x) < • • • < Xd{x) 
be the eigenvalues of H(x) counted with their multiplicity, and let • • ■ , Vd(x)) be an 
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orthonormal basis of associated eigenvectors. 



22, the ART 



Contrarily to second order methods, such as the one proposed in Ref. 
method does not rely on a complete knowledge of the spectral decomposition of the Hessian 
matrix. Instead, it only makes use of the direction of negative curvature. We consider here 
various modifications of the ART method, differing from the original ART algorithm by the 
fact that they also make use of the associated eigenvalue (i.e. of the curvature itself). A 
prototype of such algorithm reads 

X k+ 1 =X h - pr V 1 {x k ) - fi t n vi{xk) ±Vf{Xk), (1) 

mm{Ai{Xk), —a c ) 

where A c > and \x t > are fixed numerical parameters, and H Vl r Xk \x = I — (vi(xk), -)vi(xk) 
is the orthogonal projector on the hyperplane Vi(xk) ± - 

In order to clarify the behavior of the algorithm ([T]) and the role of the numerical pa- 
rameters A c > and fit > 0, let us consider the simple example of a quadratic function 
/: 

/(x) = ^5>M 2 , (2) 

3=1 

with x = (x 1 , ■ ■ ■ , x d ) and A x < A 2 < • • • < A^. In this simple case, (QQ) reads as a system of 
d decoupled scalar equations 

4+i = (1 - Vt\) x{, 2<j<d, 



yielding 

Ai_ 

min(Ai, -A c ) 

where x$ is the initial guess of the algorithm. 

Assume that all the Aj are different from zero. In this case, / has a unique critical point 
(the origin), and the algorithm converges to this critical point for all choices of the initial 
guess if and only if 

Ai < and < A,- < 2/i^ 1 for all 2 < j < d. 



This means that if the algorithm converges, it will be toward a critical point with Morse 
index equal to one (a first order saddle point). Conversely, if is a saddle point with Morse 
index equal to one (i.e. if Ai < < A 2 < • • • < Xd), the algorithm will converge to zero if and 
only if Xd < 2fi^ x . The numerical parameter \i t controls the convergence in the hyperplane 
x 1 = 0. If fi t is too small, convergence will be slow, if \x t is too large, the algorithm will be 
unstable. Note that if Ai < — A c , convergence in the e\ direction (the direction of negative 
curvature) will be obtained in a single iteration, while linear convergence will be observed 
if — A c < Ai < 0. The role of the parameter A c is to prevent the algorithm, when applied to 
a non-quadratic energy landscape, from becoming unstable in the region where |Ai(xfc)| is 
small. 

Let us now come back to the case of practical interest when / is the PES of some 
molecular system. As mentioned in the introduction, we focus here on the local convergence 
properties and henceforth assume that the iterates have reached the neighborhood of a 
first order saddle point. One can then prove (see the Appendix) that the algorithm (pQ) 
converges to the saddle point, quadratically in the principal direction of negative curvature, 
and linearly in the perpendicular directions. Let us note that quadratic convergence is 
obtained under the assumption that the smallest eigenvalue X\{x^) of the hessian matrix 
H(x k ) and the corresponding eigenvector Vi(xk) are computed exactly. However, a key 
ingredient in ART-like algorithms is that Xx(x k ) and V\{x k ) are computed approximately, by 
iterative methods. Thus, for instance, the eigenelement {Xi{x k ),Vi(x k )) can be computed 
by Lanczos or Arnoldi methods, which are based on repeated evaluations of matrix-vector 
products of the form H(xk) v. In turn, such matrix-vector products can be approximately 
computed using a finite-difference formula, such as the first-order formula 

H{x h ) t; « i (V/(x fc + ev) - V/(x fc )) (3) 

or the second order formula 

H(x k ) vn^ (Wf(x k + ev) - Vf(x k - ev)) . (4) 

In summary, the eigenelement (Xi(x k ), V\(x k )) is computed approximately by repeated eval- 
uations of forces —Vf(y) for a collection of configurations y close to the reference configura- 
tion Xk- However, one can prove (see again the Appendix) that the algorithm ([!]) is robust, 
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in the sense that it can accomodate approximate evaluations of (\i(xk),vx(xk))- The price 
to pay is a lower convergence rate in the principal direction of negative curvature. 

The prototypical algorithm ([T]) is not far from being optimal in the direction of negative 
curvature, even in presence of numerical errors in the evaluation of (Ai(xfe), Vi(x^)). On the 
other hand, it is clearly suboptimal in the transverse directions, where it behaves as a basic 
fixed step-size gradient. Improvements can be obtained by resorting to conjugate gradient, 
quasi-Newton or trust-region methods in the transverse direction 3 ^. It is also possible, in 
principle, to take into account the p lowest eigenvalues of H(xk) obtained from the Lanczos 
or Arnoldi partial diagonalization procedure, to construct a surrogate function that will 
provide a better model for / in the neighborhood of x\.. Such improvements of the current 
ART-like algorithms will be considered in a future work. 



III. NUMERICAL RESULTS: MIGRATION OF POINT DEFECTS IN a-IRON 

In this section we discuss the practical implementation of algorithm ([T]) in the case of 
basic defects in a-iron: small self interstitial (SIA) and vacancy (VAC) clusters (1 to 3 
defects). The crystal of a-Fe is modeled by the EAM potential developed by Mendelev et 
alM^S- which has been the most widely used in recent years to study interstitial loops — 
Marinica et al. have previously used the same potential and the standard ARTn method to 
test and reveal the energy landscape of small interstitial clusters (1 to 4 self-interstitials) in 
a-Fe^ 3 -. It therefore gives us a good basis for comparison. The crystal consists of 1024±n 
atoms (n=l,2,3). 

Starting from a local minimum configuration, the first stage of the activation step is to 
push the system out of the basin. In order to do this, the system is slightly deformed using 

x k+1 = x k + fx A Ax (5) 

where Ax is a fixed normalised deformation of the system, which is for the moment chosen 
randomly, and fi A is a user-defined fixed step. Possibilities for well-chosen initial deforma- 
tions will be explored in future work. In this paper we use the defect centered deformation 43 
instead of global deformation. This means that the random deformation Ax is applied only 
on the atoms within a certain radius around the defect. The reason for this choice is that 
the efficiency of the algorithm in the defect centered deformation is independent of the size 
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of the system and provides the best rate of successful to unsuccessful activation processes 
(this will be elaborated on later on in the section). At each iteration the system is relaxed 
in the hyperplane orthogonal to the direction Ax. If, after this relaxation, the lowest eigen- 
value is still positive, we continue the deformation. As soon as \i(xk) becomes sufficiently 
negative (Ai(x&) < \d for some threshold < 0), we move onto the next stage of the ac- 
tivation process. The threshold is used in order not to be misguided by numerical errors of 
the eigenvalue calculation. The lowest eigenvalue is computed using the Lanczos algorithm 
with 15 iterations, a small number compared to the size of the Hessian matrix (recall that 
H(xk) G IR 3Ar><3Ar , where iV is the number of atoms in the system). 

Once the system is out of the basin, we begin to move the system towards the saddle 
point, in the hope of following the minimum-energy reaction path. The previously used 
method (slightly modified ARTrz)^ 3 - for this stage is: 

x k+1 =x k - ^=vx(x k ) (6) 

where [i a is a user-defined constant and 1/y/k ensures that the step size gets smaller as 
we approach the saddle point. The direction of the eigenvector v\ is chosen such that it 
points in the same direction as the force i.e. (— V/ (x^.), Ux(xfc)) > 0. This is then followed 
by a relaxation in the hyperplane, which is discussed in the next paragraph. Algorithm (jSj) 
was an improvement to some previous methods 6 . However it has several drawbacks. The 
constant parameter fi a in the algorithm needs to be defined according to the PES in study, 
and even so may be suited for some saddle point searches but not for others (very tightly 
positioned saddle points may force \x a to be small for the whole system, which may in turn 
impede results when the surface becomes relatively smooth). With V\{x^) unitary and \i a 
fixed, it is clear that decreasing the step size according to the number of iterations is not 
ideal. In fact it would be better to use the first and second derivative information of the 
energy surface. Taking as a simple example the function (jSJ) with d = 2 (solution x* at 
the origin), we can position ourselves at a point x n where the displacement from x* is in 
the direction of negative curvature. The further along this direction we are positioned, the 
more iterations would be needed to approach x* since algorithm (J6j) would take smaller and 
smaller steps. In this simple case the proposed algorithm (jTJ) would jump to the solution in 
one step. 

The minimization of the forces in the orthogonal hyperplane consists of following a fixed 
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step steepest descent method but with the force projected onto vi(xk) ± , until the forces in 
this plane are zero or a maximum number of steps M is reached. In the case where we reach 
a configuration x^ with X\(xk) > 0, we restart and depart from the current local minimum 
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for details 



and make a displacement in a different randomly chosen direction (see Ref. 
on relaxation). The number of minimization steps taken is limited due to the potentially 
costly force evaluations. The success of the ARTn method however relies strongly on good 
minimization in the hyperplane. Possible improvements including trust region and conjugate 
gradient methods will be explored in future work. 

The main contribution of algorithm (CQ) is the step taken in the direction of the negative 
curvature. In the numerical results reported later on in this section, we have implemented 
this step in the attracting region. On the other hand, we continue to use equation (J5J) for 
leaving the basin and the same algorithm as described in the previous paragraph for the 
relaxation in the hyperplane. 

The efficiency of these ART-type algorithms depends on two main points: the number 
of force evaluations required during the activation stage and the ratio of successful to un- 
successful searches. The failure to find a saddle point can be determined in several ways. 
If minimization in the hyperplane is not done sufficiently well, the system risks climbing 
the energy surface too high. Once settled at a saddle point, it could be one which is not 
associated with the local minimum where the activation process began. It could also be the 
case that we fall on a saddle point where the energy is lower than the starting point, which 
is an immediate indication that we have overlooked at least one adjacent saddle point of the 
local minimum and fallen beyond. Finally, another sign of failure is when relaxation in the 
hyperplane yields a positive \i(xf.), in which case we have reached another local minimum. 
It remains a challenge to be certain that a saddle point falls in the first of the three cat- 
egories mentioned. For the purposes of this study therefore, we will only reject stationary 
configurations if the energy is below that of the initial local minimum or if we are in fact at 
another minimum configuration. 

Comparisons between algorithms (EJ) and (0Q) are done on interstitial and vacancy de- 
fects using the parameters shown in Table [H The results are shown in Table [III Over a 
total number of 1000 successful events, (/) is the average number of force evaluations per 
activation process and r\ is the ratio of successful events to unsuccessful events. It can be 
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observed that the proposed algorithm (pQ) improves performance by a large margin both in 
terms of the average number of force evaluations and the proportion of successful events. 
The elimination of the constant factor \x a in algorithm ([H]) not only makes the algorithm 
more efficient but also more versatile. It may be applied to a wide range of potential energy 
surfaces without the need for parameter manipulation. 
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APPENDIX A: APPENDIX: LOCAL CONVERGENCE ANALYSIS 

In this Appendix, we prove that algorithm (JTj) is locally convergent, even when the 
eigenelement in the direction of negative curvature is approximately computed. 

Let such that V/(x*) = and Ai(x*) < < A 2 (:c*) < • • • < A^(x*). We introduce the 
notation v* = i>i(x*), A* = Ai(x*), = V 2 /(x*), 

Note that 

x k - = z k v\ + y k , hence \x k - x*\ 2 = \z k \ 2 + \y k \ 2 . 

In the analysis below, we often use that z k = 0(\e k \). 

We consider algorithm JT]), where the eigenelement (Xi(x k ),Vi(x k )) is now computed 
approximately. The resulting algorithm, that we analyze below, reads 

x k +i =x k : —vi(x h ) - ^II ei(a:fe) ± V/(x fc ), (Al) 

mm(Ai(a; fc ), -A c ) 

where Ai(xfc) and Vi(x k ) are approximations of X\(x k ) and t>i(xfc): 

~ / \ / \ . T / \ Ai(Xfc) 

ui(ajfc) = vi(^fc) + a*:, Ai(x fc ) = — — — , 

where the errors and /?* are supposed to be small (i.e. ^ 1 an d |/3fe| <C 1). We 
assume that ^(sc*)! = 1. Note that we have made no assumption on the hessian matrix 



H(x k ). Hence, the errors a k and f3 k take into account both a possible approximation in the 
computation of H(x k ) (see ([3]) and (TjJ), and an approximate partial diagonalization of this 
matrix (by a Lanczos or Anoldi algorithm). 

We assume that \i(x*) < — A c and that x k is close enough to x* such that \i(x k ) < — A c 
for all k sufficiently large. We also assume that the error (3k is small enough such that 
Ai(xfc) < —A c for all k sufficiently large. 

It follows from flATj) that 

(Vf(x k ),vx(x k )) 

Zk+i = z k =— — (^(xjb),^) - ^{v^li^u^xV }{x k )) 

M{x k ) 

{Vf{x k ),Vi{x k )+a k ) , a \f/\, *\ 

= z k — — - (1 +/3 k ){v 1 (x k ) + a k: v 1 ) 

Ai(x fe ) 

-m(U^ (xk) xvlVf(x k )) (A2) 

and 

(Vf(x k ),vi(x k )) rr rr vjf( \ 

Vk+x = Vk =-r- — n v *±v 1 (x k ) - ^n„»xn %(a:fe) xv/(x fc ) 

Ai(£fc) 

(V/(^),fi(xfc) . . 

= ry-^ (1 + A)n »±fi(x fc ) - /i t n v *±.\l %l{xk) xVf{x k ). (A3) 

Assuming that / is C 2 {R d ) fl L°°(lR a! ) with bounded first and second derivatives, it holds 

Vf(x k ) = H*(x k - x,) + 0(\x k - x,\ 2 ) = \\z k vl + H*y k + 0(|e fe | 2 ). (A4) 

Besides, using perturbation theory, one obtains 

Ai(x fc ) = \l + (vt,(VH(x*)-e k )vl) + 0(\e k \ 2 ), 

vi(x k ) = vl-U^^-X^l^y'u^iVHix^-e^vl + Oilekl 2 ). (A5) 

From (IA40 and (IA50 . we deduce 

(V/(x fc ),vi(x fc )) = A^ fe + 0(|e fc | 2 ), 

Mx k ),vl) = l + 0(|e fc | 2 ), 

U Mxk) xvl = 0(\e k \) + 0(\a k \), 

U^v^Xk) = 0(\e k \) + 0(\a k \), 

U v ,xU Mxk) xVf(x k ) = H*y k + 0(\e k \ 2 ) + 0(\e k \ \a k \). 
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Inserting these equations in (1A2I) and (1A3[) . we obtain 

(Wf(x k ),vi(x k )) 

z k +i = z k Ai(gfc) C^iW'^i) -^(n^^^-Lfx, vf{x k )) 

\ I 1 +Pfc)(Vl(Xfc) + 0^,^) 

Ai(arjfe) 
(V/(x fc ),fi(x fc )) 



Ai(x fc ) 
(V/(x fc ),fi(x A; )) 



(afc,i£) 



Ai(x fc ) 

= 0(|e fc | 2 ) + 0(|e fc | la*]) + 0(|e*| |/5 fc |) (A6) 
on the one hand, and, on the other hand, 

yk+i = Vk yt~\ u vi ±v i( x k) - /^n^xn^^v/^) 



Ai(x fe ) 
(V/(x fc ),a fc ) 



Ai(x fc ) 
(V/(x fc ),fi(x fc ) + a k 



n„ f ±vi(x fc ) 



Ai(x fc ) 

— {I — HtH*)yk + 0(\e k \ 2 ) + 0(\e k \ \a k \) + 0(|e fc | \(3 k \). (A7) 
As %)k € f \ L and as if* is positive definite on v^ 1 , we get 

-// t #*) K _L|| 2 = max(l - fi t \ 2 ,nAd ~ !)• 
Thus, if yU t < 2/A d , we infer from flA6l) and flATl) that 

\x k+ i - x*| < 7|x fc+ i - x*| + 0(|x fc - x*| 2 ) + 0(|x fc - x*| |ajfc|) + 0(|x fc - x*| |/3 fc |), 

with 7 = ||(J — fitH*) || 2 < 1- Under the assumption that the errors a& and are 
uniformly bounded by a small constant, this proves that algorithm (lAlj) locally converges, 
and that the convergence speed is at least linear. 

In the case when the eigenelement (Xi(x k ),Vi(x k )) is exactly computed, algorithm (1 All) 
reduces to algorithm (CQ). We hence have proved that algorithm (CTJ locally converges, and 
that this convergence is robust with respect to errors in the computations of the lowest 
eigenvalue (and the associated eigenvector) of H(x k ). 

Estimates for the convergence of algorithm ([1]) are readily obtained from (IA6I) and (IA7I) , 
by setting a k = and j3 k = 0. We obtain 

z k+1 = 0(\e k \ 2 ) and y k+ i = (1 - thH*)yk + °(\e k \ 2 ). 
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Note that the convergence for (e.g. in the principal direction of negative curvature) is 
quadratic. If errors are introduced in the computation of the eigenelement (Ai(xfe), Vx(xk)), 
the rate of convergence of z k becomes linear, as can be seen in ( ]A6j) . 
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TABLES 



n SIA n VAC 

Ref. 43 This work Ref. 43 This work 
A c - 0.5 - 0.5 
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HA 0.6 0.6 0.2 0.2 

li a 0.24 - 0.08 

M 18 18 18 18 



TABLE I: Parameters used in implementation. The parameter \iA is taken from studies by Marinica 
et al.—, with \i a j 'ha = 0.4. 



number 






SIA 




VAC 


of defects 




ARTr*^ This work 


ARTr*^ This work 


1 


(/) 


462 


298 


780 


291 




V 


4.6 


4.7 


1.8 


7.9 


2 


(f) 


548 


328 


705 


323 




V 


4.2 


4.4 


2.6 


7.1 


3 


(/) 


691 


320 


667 


321 






2.6 


4.4 


2.8 


7.4 



TABLE II: Comparison of a previous ARTra approach^ and the algorithm presented in this arti- 
cle for interstitial and vacancy defects. The new algorithm reduces the average number of force 
evaluations ((/)) by about 40% and 55% for the self-interstitial atoms (SIA) and vacancies (VAC) 
case respectively. In the case of SIA, the ratio of successful to unsuccessful searches (r/) is almost 
constant. However, in the case of vacancies, r\ is increased by over 260%. 
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